Breast cancer, viruses, and human leukocyte antigen (HLA)

Several viruses have been implicated in breast cancer, including human herpes virus 4 (HHV4), human herpes virus 5 (HHV5), human papilloma virus (HPV), human JC polyoma virus (JCV), human endogenous retrovirus group K (HERVK), bovine leukemia virus (BLV) and mouse mammary tumor virus (MMTV). Human leukocyte antigen (HLA) is involved in virus elimination and has been shown to influence breast cancer protection/susceptibility. Here we investigated the hypothesis that the contribution of a virus to the development of breast cancer would depend on the presence of the virus, which, in turn, would be inversely related to the success of its elimination. For that purpose, we estimated in silico the predicted binding affinities (PBA) of proteins of the 7 viruses above to 127 common HLA alleles (69 Class I [HLA-I] and 58 Class II [HLA-II]) and investigated the association of these binding affinities with the breast cancer-HLA (BC-HLA) immunogenetic profile of the same alleles. Using hierarchical tree clustering, we found that, for HLA-I, viruses BLV, JCV and MMTV were grouped with the BC-HLA profile, whereas, for HLA-II, viruses BLV, HERVK, HPV, JCV, and MMTV were grouped with the BC-HLA profile. Finally, for both HLA classes, the average PBAs of the viruses grouped with the BC-HLA profile were significantly lower than those of the other, non-BC-HLA-associated viruses. Assuming that low PBAs are likely associated with slower viral elimination, these findings support the hypothesis that defective/slower elimination of these viruses and, hence, their longer persistence and the inefficient/delayed production of antibodies against them underlie the observed association of the low-PBA group with breast cancer.

Breast cancer is the most common cancer affecting women worldwide [1]. Several risk factors have been identified, including family history of breast cancer, dense breast tissue, female reproductive factors, alcohol or tobacco use, body mass index, and genetics (e.g., BRCA gene) [2,3]; however, nearly half of breast cancers develop in women in the absence of these risk factors [3], suggesting that additional factors likely contribute to breast cancer risk.
The role of viruses in breast cancers is increasingly recognized and is likely underestimated [4]. As documented elsewhere, human herpes viruses (HHV) including Epstein-Barr virus (EBV, HHV4) and cytomegalovirus (CMV, HHV5), mouse mammary tumor virus (MMTV), high-risk human papilloma viruses (HPVs), bovine leukemia virus (BLV), human polyomavirus JC virus (JCV), and human endogenous retrovirus K (HERVK) have been implicated in human breast cancer [4][5][6][7][8][9]. Several viruses (e.g., MMTV, HPV, EBV, and BLV) have been identified and shown to co-exist in human breast cancer cells [10][11][12], and in benign breast biopsies 1-11 years before the development of cancer [11]. They have also been identified in normal breast tissue samples and in milk of normal lactating women, albeit to a lesser extent [10,11]. Indeed, viruses linked to human cancers are ubiquitous, yet only a small proportion of infected individuals develop cancer, one of many reasons that have made it challenging to identify causal relations between viruses and cancer [13]. One factor that may moderate the association between viruses and breast cancer is variation in host immunogenetics related to human leukocyte antigen (HLA).
HLA genes, located on chromosome 6, code for two main classes of cell-surface proteins involved in the immune response to foreign antigens including viruses and cancer neoantigens [14,15]. HLA-I molecules of the classical genes A, B and C are expressed on all nucleated cells and bind and present small peptides (8-10 amino acid residues [16]) from proteolytically degraded foreign antigens to CD8+ cytotoxic T cells, signaling cell destruction. HLA-II molecules of the DPB1, DQB1 and DRB1 genes are expressed on lymphocytes and professional antigen-presenting cells and present larger peptides (12-22 amino acid residues [17]) derived from endocytosed exogenous antigens to CD4+ T cells, facilitating antibody production and adaptive immunity. Each individual carries two alleles of each HLA gene, for a total of 12 classical HLA alleles. The HLA region is the most highly polymorphic region of the human genome [18], with most of the variation existing in the binding groove. This variation amounts to tremendous individual variability in the ability to bind and eliminate viruses and other foreign antigens. Specific HLA alleles have been associated with breast cancer protection or susceptibility [19][20][21][22][23][24][25][26][27][28]. This association is captured in the breast cancer-HLA immunogenetic profile, which contains the correlations between the prevalence of breast cancer and HLA allele frequency [19].
Given the documented involvement of several viruses in breast cancer, discussed above, we investigated in this study possible viral elimination by the HLA system as a mechanism for preventing the oncogenic effect of those viruses. More specifically, we focused on 7 viruses that have been found in breast cancer tissue (HHV4, HHV5, HPV, JCV, MMTV, BLV, HERVK) and estimated in silico their binding affinity with respect to 69 common HLA-I alleles of the 3 classical genes (A, B, C) and 58 common HLA-II alleles of the 3 classical genes (DPB1, DQB1, DRB1). Since binding is a critical initial step in foreign antigen elimination, it is reasonable to assume that high binding affinity would be more effective in virus elimination, and vice versa for low binding affinity. Thus the objectives of this study were (a) to estimate in silico the predicted binding affinity of specific viruses with respect to specific alleles using the Immune Epitope Database (IEDB) NetMHCpan (ver. 4.1) tool [29,30], (b) to identify those viruses whose binding affinities were associated with the breast cancer-HLA immunogenetic profile, and (c) to test the hypothesis that the predicted binding affinity of this set of viruses is lower than that of the viruses unassociated with the breast cancer-HLA profile.
The overall design of our analyses is depicted in the schematic diagram of Fig. 1. Details of the analyses are provided in each of the sections to follow.

Effect of virus and HLA class on PBA
The effects on PBA of Virus, HLA Class, and their interaction were evaluated using a repeated-measures analysis of variance (ANOVA), where the 7 viruses comprised the "Within-Subjects" Virus factor and the 2 HLA classes comprised the "Between-Subjects" fixed Class factor. We found the following: (a) the effect of Virus was highly significant (P < 0.001, Greenhouse-Geisser test), with JCV and BLV having lower average PBA scores (Fig. 2); (b) the effect of HLA Class was also highly significant (P < 0.001, F-test), with HLA-I having 2.5× higher scores than HLA-II (Fig. 3A); and (c) the Virus × Class interaction term was also highly significant (P < 0.001, Greenhouse-Geisser test) (Fig. 3B). This interaction seems to be due mainly to the fact that the PBA scores for JCV and BLV are disproportionately lower in HLA-II than in HLA-I and are substantially lower than those of the other viruses.

Effect of HLA-I and HLA-II genes on PBA
Given the significant Virus × HLA Class interaction above, the effects of Virus and Gene on PBA were evaluated separately for each HLA class using 2 separate repeated-measures ANOVAs, one for each HLA Class, where Virus was the Within-Subjects factor as above, and Gene was the Between-Subjects factor comprising the 3 genes of HLA-I (A, B, C) or the 3 genes of HLA-II (DPB1, DQB1, DRB1). These analyses also evaluated the effect of Virus within each HLA Class separately. We found the following: (a) there was a significant effect of Virus (P < 0.001, Greenhouse-Geisser test) for both HLA-I (Fig. 4, left panel) and HLA-II (Fig. 4, right panel); (b) for HLA-I, there was a marginally significant effect of Gene (P = 0.024, F-test), with higher PBA values for gene C (Fig. 5A), whereas the Virus × Gene effect was not statistically significant (P = 0.166, Greenhouse-Geisser test) (Fig. 5B); (c) for HLA-II, there was a significant effect of Gene (P = 0.011, F-test), with higher PBA values for gene DQB1 (Fig. 6A), and the Virus × Gene effect was highly significant (P < 0.001, Greenhouse-Geisser test) (Fig. 6B).

Association between PBA and breast cancer: HLA immunogenetic profile
As depicted in the schematic diagram of Fig. 1, our analyses culminated in the hierarchical tree clustering, which we applied to the data of HLA-I and HLA-II shown in Tables 2 and 3, respectively. This analysis yielded 2 dendrograms, one for each class. In both cases, there were 2 clusters, as follows. For HLA-I, BC-HLA immunogenetic scores were grouped with BLV, JCV and MMTV (Fig. 7A). Remarkably, the average PBA scores of the viruses in this BC-associated group (red in Fig. 7A,B) were significantly lower than those in the other, non-BC group (blue in Fig. 7A,B) (P < 0.001, paired-samples t-test). For HLA-II, BC-HLA immunogenetic scores were grouped on one sub-branch with JCV and BLV, whereas MMTV, HERVK, and HPV were grouped on the other sub-branch of the same cluster. As with HLA-I, the average PBA scores of the 5 viruses in the BC-associated group (red in Fig. 8A,B) were significantly lower than those in the other, non-BC group (P = 0.038, paired-samples t-test). Altogether, these results document the grouping of the PBA of certain viruses with the BC-HLA immunogenetic profile, and their lower predicted binding affinity as compared with the viruses not grouped with BC-HLA.
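The group comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names `group_mean_pba` and `paired_t` and the toy PBA values are hypothetical, and the sketch returns only the t statistic and its degrees of freedom (a P value would additionally require the t distribution, for example from a statistics package).

```python
import math
from statistics import mean, stdev


def group_mean_pba(rows, viruses):
    """Average PBA over the given viruses, one value per HLA allele.

    rows: list of dicts mapping virus name -> PBA for a single allele.
    """
    return [mean(row[v] for v in viruses) for row in rows]


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for a vs. b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    # Standard error of the mean paired difference.
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical toy values (NOT data from Tables 2 and 3): one dict per allele.
toy_rows = [
    {"BLV": 1.0, "JCV": 1.0, "HHV4": 2.0, "HHV5": 2.0},
    {"BLV": 1.0, "JCV": 3.0, "HHV4": 4.0, "HHV5": 4.0},
    {"BLV": 2.0, "JCV": 4.0, "HHV4": 6.0, "HHV5": 6.0},
]
bc_group = group_mean_pba(toy_rows, ["BLV", "JCV"])       # BC-associated viruses
other_group = group_mean_pba(toy_rows, ["HHV4", "HHV5"])  # remaining viruses
t_stat, df = paired_t(bc_group, other_group)
print(f"t = {t_stat:.3f}, df = {df}")
```

A negative t statistic here corresponds to lower average PBA in the BC-associated group; pairing is by HLA allele, so each allele contributes one difference.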

Discussion
In light of separate lines of evidence linking both HLA and viruses to breast cancer, we first evaluated the predicted binding affinity of specific viruses implicated in breast cancer with regard to 127 common HLA alleles and then examined the associations between those viral protein binding affinities and a population-derived breast cancer-HLA profile [19]. With regard to the former, the overall results documented variation in HLA-I- and HLA-II-mediated immunity to viral proteins implicated in breast cancer by virus, by HLA gene, and across alleles within each gene. Specifically, our findings documented (a) higher predicted binding affinities for HLA-I alleles than for HLA-II alleles, (b) higher binding affinities for gene C of HLA-I and gene DQB1 of HLA-II, and (c) overall lower binding affinities for JCV in both HLA-I and HLA-II. With respect to viruses, it is worth noting that all 7 viruses investigated here have been implicated in breast cancer [4][5][6][7][8][9]. This study focused on immunogenetic aspects of these viruses, both with respect to their predicted binding affinities to HLA-I and HLA-II alleles and their grouping with the breast cancer-HLA immunogenetic profile. A major finding of the latter analysis was the grouping of specific viruses with breast cancer (shaded red in Figs. 7A and 8A), fewer for HLA-I (3/7 viruses, Fig. 7A) than for HLA-II (5/7 viruses, Fig. 8A). This grouping enabled us to test the hypothesis that this association of specific viruses with breast cancer immunogenetics may be due to lower virus binding affinity to HLA molecules, thus delaying the elimination of the virus directly (via HLA-I-CD8+ engagement leading to death of the infected cell) and/or indirectly (via HLA-II-CD4+ engagement leading to antibody production). Indeed, this was found to be the case for both HLA Classes (Figs. 7B, 8B). It is worth pointing out that the present findings do not preclude possible involvement of viruses in breast cancer via other mechanisms that remain to be identified and investigated.
In summary, the present study provides a novel contribution implicating HLA-mediated virus immunogenicity in breast cancer. Still, several limitations must be considered. First, the analyses included 127 common HLA-I and HLA-II alleles and 7 viruses. The HLA region, however, is highly polymorphic, and the binding affinities of the viruses to other, less common HLA alleles were not investigated. Second, although the present analyses focused on 7 viruses that have previously been implicated in breast cancer, other viruses not included here may be involved in breast cancer. Third, we evaluated the binding affinities of HLA molecules to representative proteins of the 7 viruses, all of which are involved in viral entry into the host cell; still, other proteins may have different binding affinities. For example, several hundred types of HPV exist, several of which are associated with high risk for cancer [31]; here, we evaluated HPV16, one of the most common types of HPV involved in cancer risk [32], yet other types of HPV may have different binding affinities. Finally, the breast cancer-HLA profile was based on the population prevalence of breast cancer in general [19]; specific types of breast cancer may have a different HLA profile. Despite these limitations, the current findings provide novel insights regarding the interaction of virus exposure and host immunogenetics with regard to breast cancer.
[Figure caption fragment; beginning lost in extraction] ... 2 and 3); 6, N = 127 HLA alleles × 7 viruses = 889 HLA-I (Table 2) and HLA-II (Table 3) estimated binding affinities.

HLA alleles
We obtained the population frequency in 2019 of 69 common HLA-I alleles and 58 common HLA-II alleles that occurred in at least 9 of 14 Continental Western European countries (Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland) at frequencies ≥ 0.01, as described previously [33].
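The inclusion criterion above (frequency ≥ 0.01 in at least 9 of the 14 countries) can be expressed as a short filter. The sketch below is illustrative only: the function name `common_alleles`, the data layout (a nested dictionary of allele to country to frequency), the placeholder country labels and the frequencies are all hypothetical and are not taken from the study data.

```python
def common_alleles(freqs, min_freq=0.01, min_countries=9):
    """Return alleles reaching min_freq in at least min_countries countries.

    freqs: dict mapping allele name -> dict mapping country -> frequency.
    """
    keep = []
    for allele, by_country in freqs.items():
        # Count the countries in which the allele meets the frequency threshold.
        n_ok = sum(1 for f in by_country.values() if f >= min_freq)
        if n_ok >= min_countries:
            keep.append(allele)
    return sorted(keep)


# Hypothetical toy frequencies for 14 placeholder countries.
countries = [f"country{i}" for i in range(14)]
toy = {
    "A*01:01": {c: 0.05 for c in countries},
    # Made-up allele name: common in only 5 of the 14 countries.
    "B*99:99": {c: (0.02 if i < 5 else 0.001) for i, c in enumerate(countries)},
}
print(common_alleles(toy))
```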

Breast cancer: HLA protection/susceptibility (P/S) scores
These scores are correlations (Fisher z-transformed) between the prevalence of breast cancer in the 14 countries above and the population frequency of each one of the 127 HLA alleles in the same countries. The scores have been published [19] and are given in Tables 2 and 3.
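As an illustration of how such a score can be computed for one allele, the sketch below correlates country-level values and applies the Fisher z-transformation, z = atanh(r). It assumes a Pearson correlation, which the text does not state explicitly; the function names `pearson_r` and `ps_score` and the toy numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ps_score(prevalence, allele_freq):
    """Fisher z-transformed correlation (assumed Pearson) across countries."""
    return math.atanh(pearson_r(prevalence, allele_freq))


# Hypothetical toy values for 4 countries (NOT study data).
toy_prevalence = [1.0, 2.0, 3.0, 4.0]
toy_allele_freq = [2.0, 1.0, 4.0, 3.0]
print(round(ps_score(toy_prevalence, toy_allele_freq), 4))
```

Under this reading, a negative score would indicate that higher allele frequency goes with lower breast cancer prevalence (protection), and a positive score the opposite (susceptibility).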

Virus proteins
We estimated in silico the predicted binding affinities (for each one of the 127 HLA alleles) of proteins of 7 viruses that have been implicated in breast cancer, namely HHV4, HHV5, HPV, JCV, HERVK, BLV, and MMTV. Details of the proteins analyzed are given in Table 1 and their amino acid (AA) sequences are given in the Appendix (Supplementary Materials).

In silico determination of predicted binding affinity of HLA-I and HLA-II alleles
Predicted binding affinities were obtained for viral protein epitopes using the Immune Epitope Database (IEDB) NetMHCpan (ver. 4.1) tool [29,30]. More specifically, we used the sliding window approach [34][35][36] to test exhaustively all possible linear 9-mer (for HLA-I predictions) and 15-mer (for HLA-II predictions) AA residue epitopes of the 7 viral proteins analyzed (Table 1). The method is illustrated in Figs. 9 and 10 for the JCV protein. For each epitope-HLA molecule pair tested, this tool gives, as an output, the percentile rank of the binding affinity of the HLA molecule for the epitope among the predicted binding affinities of the same HLA molecule for a large number of different peptides of the same AA length; the smaller the percentile rank, the better the binding affinity. Given a protein N amino acids long and an epitope length of k AA, there are N − k + 1 binding affinity predictions, i.e., N − k + 1 percentile ranks. Of these predictions, for each viral protein and HLA molecule tested, we retained the lowest percentile rank (LPR) as the best possible binding affinity of the protein-HLA molecule pair. We then applied two transformations to LPR. First, we took its inverse, so that higher values mean better binding affinities, for more intuitive interpretation:
$$\mathrm{LPR}' = \frac{1}{\mathrm{LPR}} \tag{1}$$
The LPR′ distribution was heavily skewed (Fig. 11A), resembling an exponential distribution and deviating substantially from a normal distribution (Fig. 11B). Therefore, LPR′ values were (natural) log transformed to normalize the distribution for quantitative analyses (Fig. 12A,B):
$$\mathrm{PBA} = \ln(\mathrm{LPR}') \tag{2}$$
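The procedure above (sliding window, lowest percentile rank, inverse, natural log) can be sketched as follows. This is a hedged illustration, not the pipeline used in the study: the actual percentile ranks come from the IEDB NetMHCpan tool, which is not called here. Instead, the predictor is passed in as a function, and `toy_rank` is a made-up stand-in with no biological meaning; the names `sliding_windows`, `pba` and the example sequence are likewise hypothetical.

```python
import math


def sliding_windows(sequence, k):
    """All consecutive k-mers of a sequence (N - k + 1 of them for length N)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def pba(sequence, k, rank_fn):
    """Log-transformed inverse of the lowest percentile rank over all k-mers.

    rank_fn: callable mapping a peptide to a percentile rank (> 0) for one
    HLA molecule; in the study this role is played by the prediction tool.
    """
    peptides = sliding_windows(sequence, k)
    if not peptides:
        raise ValueError("sequence is shorter than the window length")
    lpr = min(rank_fn(p) for p in peptides)  # lowest percentile rank (LPR)
    if lpr <= 0:
        raise ValueError("percentile rank must be positive")
    lpr_inv = 1.0 / lpr                      # inverse: higher means better binding
    return math.log(lpr_inv)                 # natural log to reduce skewness


def toy_rank(peptide):
    """Made-up stand-in for a predictor: NOT a real binding model."""
    return 50.0 / (1 + peptide.count("L"))


toy_sequence = "MKLLAVLLGA"  # hypothetical 10-residue sequence
print(len(sliding_windows(toy_sequence, 9)), round(pba(toy_sequence, 9, toy_rank), 4))
```

In use, k would be 9 for HLA-I and 15 for HLA-II predictions, and the function would be evaluated once per viral protein and HLA allele.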

Figure 2. Mean (± SEM) predicted binding affinities of the 7 viral proteins used across all 127 HLA alleles.

Figure 4. Mean (± SEM) predicted binding affinities for HLA-I and HLA-II for each virus studied.

Figure 7. Hierarchical tree clustering results for HLA-I and BC-HLA profile. (A) Dendrogram of the 7 viruses' predicted binding affinities and BC-HLA profile. (B) Mean (± SEM) of predicted binding affinities of the viruses in the two color-coded groups of the dendrogram. N = 69 HLA-I alleles. See text for details.

Figure 8. Hierarchical tree clustering results for HLA-II and BC-HLA profile. (A) Dendrogram of the 7 viruses' predicted binding affinities and BC-HLA profile. (B) Mean (± SEM) of predicted binding affinities of the viruses in the two color-coded groups of the dendrogram. N = 58 HLA-II alleles. See text for details.

Figure 9. Illustration of the 9-mer sliding window approach for the in silico estimation of the predicted binding affinity for HLA-I alleles.

Figure 10. Illustration of the 15-mer sliding window approach for the in silico estimation of the predicted binding affinity for HLA-II alleles.

Figure 11. (A) Frequency distribution of raw (untransformed) predicted binding affinities (LPR′) to illustrate their deviation from a symmetric distribution. (B) Probability-probability plot of the data in (A). Data from JCV virus.

Figure 12. (A) Frequency distribution of log-transformed predicted binding affinities (natural log of LPR′) to illustrate the normalization of the distribution. (B) Probability-probability plot of the data in (A). Data from JCV virus.

Table 3. Predicted binding affinities (PBA, see "Methods") and Breast Cancer-HLA P/S scores [17] for all 58 HLA-II alleles and 7 viruses studied.